Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Transcarent
by New Contributor II
  • 8194 Views
  • 6 replies
  • 0 kudos

Error: ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: /api/2.0/workspace/list?path=%2F (Caused b...

Error: ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: /api/2.0/workspace/list?path=%2F (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @prakash reddy, is this an intermittent error, or are you able to reproduce it? Please let us know.

5 More Replies
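The host='https' in the traceback suggests the URL scheme was pasted into the host field of the client configuration, so DNS tries to resolve the literal word "https" and getaddrinfo fails. A minimal sketch of a sanity check (the `normalize_host` helper and the workspace URL are hypothetical, not part of the thread):

```python
from urllib.parse import urlparse

def normalize_host(configured: str) -> str:
    """Return a bare hostname from a value that may include a scheme or path.

    If a full URL (or just its scheme) ends up in a host field, the HTTP
    client resolves the literal string 'https', which fails DNS lookup.
    """
    # urlparse only fills .hostname when a scheme is present; add one if missing
    if "://" not in configured:
        configured = "https://" + configured
    host = urlparse(configured).hostname
    if not host or host in ("http", "https"):
        raise ValueError(f"invalid workspace host: {configured!r}")
    return host

print(normalize_host("https://adb-123.4.azuredatabricks.net/?o=123"))
# → adb-123.4.azuredatabricks.net
```

If the check raises, the configured value was a bare scheme rather than a hostname, which matches the error in this thread.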
s_plank
by New Contributor III
  • 5677 Views
  • 6 replies
  • 5 kudos

Resolved! Databricks-Connect shows different partitions than Databricks for the same delta table

Hello, here is a small code snippet: from pyspark.sql import SparkSession; spark = SparkSession.builder.appName('example_app').getOrCreate(); spark.sql('SHOW PARTITIONS database.table').show() The output inside the Databricks notebook: +-------------+--...

Latest Reply
s_plank
New Contributor III
  • 5 kudos

Hi @Jose Gonzalez, yes, the SQL connector works fine. Thank you!

5 More Replies
Krishscientist
by New Contributor III
  • 1809 Views
  • 1 replies
  • 0 kudos

Resolved! AutoML : data set for problem type "Classification"

Hi, I am working on an AutoML experiment. Could you please help me with a data set for the problem type "Classification"? Regards.

Latest Reply
Anonymous
Not applicable
  • 0 kudos

There are a lot of datasets available in /databricks-datasets/ that you can look through. You'll have to turn them into a table so that you can access them in AutoML. There are datasets associated with the Spark Definitive Guide and Learning Spark ...

Rex
by New Contributor III
  • 7341 Views
  • 4 replies
  • 3 kudos

Resolved! Cannot use prepared statements with date functions

We are using PHP and the Databricks SQL ODBC driver and cannot run a query that uses DATE functions with prepared statements. Sample script/Docker setup here: https://github.com/rlorenzo/databricks_php/blob/odbc_prepare_error/test_connection.phpFor e...

Latest Reply
Rex
New Contributor III
  • 3 kudos

@Bilal Aslam We tried CAST and CONVERT and are still getting the same error.

3 More Replies
TS
by New Contributor III
  • 1059 Views
  • 0 replies
  • 1 kudos

Is there a better way for this matching?

I have an array: var arg = condColumnsKeys with the elements arg: Array[String] = Array(LOT_PREFIX, PS_NAME_BOOK_TEMPLATE_NAME, PS_NAME_PAGE_NAME, PS_NAME_FIELD_NAME). The desired outcome is to get the string "LOT_PREFIX" and store it in var ccLotPrefix. My fir...

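The lookup described in the post amounts to picking the first exact match out of the key array. A minimal sketch of that matching, translated from the post's Scala context into Python for illustration (variable names follow the post; the `pick_key` helper is hypothetical):

```python
def pick_key(keys, target="LOT_PREFIX"):
    """Return the first key exactly equal to target, or None if absent."""
    return next((k for k in keys if k == target), None)

cond_columns_keys = [
    "LOT_PREFIX",
    "PS_NAME_BOOK_TEMPLATE_NAME",
    "PS_NAME_PAGE_NAME",
    "PS_NAME_FIELD_NAME",
]
cc_lot_prefix = pick_key(cond_columns_keys)
print(cc_lot_prefix)  # → LOT_PREFIX
```

In Scala the equivalent would be `condColumnsKeys.find(_ == "LOT_PREFIX")`, which avoids writing an explicit match over every element.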
Taha_Hussain
by Databricks Employee
  • 1858 Views
  • 1 replies
  • 1 kudos

Databricks Office Hours Our next Office Hours session is scheduled for April 27 2022 - 8:00 am PT. Do you have questions about how to set up or use Da...

Databricks Office Hours. Our next Office Hours session is scheduled for April 27, 2022 - 8:00 am PT. Do you have questions about how to set up or use Databricks? Do you want to learn more about the best practices for deploying your use case or tips on da...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 1 kudos

Just registered. Thank you and happy weekend.

StephanieAlba
by Databricks Employee
  • 3769 Views
  • 1 replies
  • 6 kudos

Resolved! Is it possible to use Autoloader with a daily update file structure?

We get new files from a third party each day. The files could be the same or different. However, each day all CSV files arrive in the same dated folder. Is it possible to use Auto Loader on this structure? We want each CSV file to be a table that gets ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 6 kudos

@Stephanie Rivera, you can use pathGlobFilter, but you will need a separate Auto Loader stream for each type of file. df_alert = spark.readStream.format("cloudFiles") \.option("cloudFiles.format", "binaryFile") \.option("pathGlobFilter", "alert.csv") \.load...

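To see what the per-stream glob does, the filter can be sketched locally: pathGlobFilter narrows each Auto Loader stream to file names matching a glob pattern. This is only a local illustration of that matching with Python's fnmatch (the file names are made up), not Auto Loader itself:

```python
from fnmatch import fnmatch

def split_by_glob(paths, pattern):
    """Keep only paths whose file name matches the glob, mimicking how a
    per-stream pathGlobFilter narrows one Auto Loader stream to one file type."""
    return [p for p in paths if fnmatch(p.rsplit("/", 1)[-1], pattern)]

# Example daily drop: several CSV types land in the same dated folder
arrivals = [
    "2022-04-01/alert.csv",
    "2022-04-01/orders.csv",
    "2022-04-02/alert.csv",
]
print(split_by_glob(arrivals, "alert.csv"))
# → ['2022-04-01/alert.csv', '2022-04-02/alert.csv']
```

One stream per pattern ("alert.csv", "orders.csv", ...) then gives one target table per file type, as the reply suggests.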
User16835756816
by Databricks Employee
  • 3023 Views
  • 1 replies
  • 5 kudos

Announcing: Delta Live Tables!

Databricks is excited to announce the general availability of Delta Live Tables to you, our community. Anxiously awaited, Delta Live Tables (DLT) is the first ETL framework that uses a simple, declarative approach to building reliable streaming or ...

Latest Reply
User16725394280
Databricks Employee
  • 5 kudos

Informative content, thanks for sharing.

Kush22
by New Contributor
  • 2514 Views
  • 0 replies
  • 0 kudos

Delete the file

While exporting data from Databricks to Azure Blob Storage, how can I delete the committed, started, and success files?

sgannavaram
by New Contributor III
  • 4850 Views
  • 1 replies
  • 2 kudos

Resolved! How to pass variables into query string?

I have two variables, StartTimeStmp and EndTimeStmp. I am going to assign the start timestamp based on the last successful job runtime, and EndTimeStmp would be the current system time. SET StartTimeStmp = '2022-03-24 15:40:00.000'; SET EndTimeStmp = '...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

@Srinivas Gannavaram, in Python: spark.sql(f""" SELECT CI.CORPORATE_ITEM_INTEGRATION_ID, CI.CORPORATE_ITEM_CD WHERE CI.DW_CREATE_TS < '{my_timestamp_variable}' ; """)

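The reply's f-string approach can be sketched end to end without a cluster; the table and column names below are hypothetical, and the formatting matches the millisecond-precision timestamps in the question. With user-supplied values you would prefer the connector's parameter binding over string interpolation:

```python
from datetime import datetime

def build_window_query(start_ts: datetime, end_ts: datetime) -> str:
    """Interpolate two timestamps into a SQL string, f-string style.

    Table/column names are made up for illustration; strftime('%f') yields
    microseconds, so [:-3] trims it to the milliseconds used in the thread.
    """
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    return (
        "SELECT * FROM job_audit "
        f"WHERE run_ts >= '{start_ts.strftime(fmt)[:-3]}' "
        f"AND run_ts < '{end_ts.strftime(fmt)[:-3]}'"
    )

q = build_window_query(datetime(2022, 3, 24, 15, 40), datetime(2022, 3, 24, 16, 40))
print(q)
```

The resulting string can be handed to spark.sql(...) exactly as in the reply above.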
Direo
by Contributor II
  • 16114 Views
  • 2 replies
  • 3 kudos
Latest Reply
User16873043212
Databricks Employee
  • 3 kudos

@Direo Direo, yeah, this is a location inside your DBFS. You have full control over it; Databricks does not delete anything you keep in this location.

1 More Replies
Direo
by Contributor II
  • 2605 Views
  • 1 replies
  • 5 kudos
Latest Reply
Hubert-Dudek
Databricks MVP
  • 5 kudos

@Direo Direo, yes, you use MERGE syntax for that: https://docs.delta.io/latest/delta-update.html. It is more efficient than overwriting if you want to update only part of the data, but you need to think about the logic of what to update, so overwriti...

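The MERGE pattern the reply points to follows a fixed shape (match on a key, update matched rows, insert the rest). A sketch that assembles such a statement as a string, with hypothetical table and column names, so the shape is visible without a Delta cluster:

```python
def build_merge_sql(target: str, source: str, key: str, cols) -> str:
    """Assemble a Delta-style MERGE statement (illustrative names only)."""
    sets = ", ".join(f"t.{c} = s.{c}" for c in cols)
    names = ", ".join(cols)
    values = ", ".join(f"s.{c}" for c in cols)
    return (
        f"MERGE INTO {target} t USING {source} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {sets} "
        f"WHEN NOT MATCHED THEN INSERT ({key}, {names}) VALUES (s.{key}, {values})"
    )

sql = build_merge_sql("events", "updates", "id", ["value", "updated_at"])
print(sql)
```

Compared with overwriting the table, MERGE rewrites only the files touched by matched keys, which is the efficiency point made in the reply; the trade-off is that you must define the match logic yourself.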
Constantine
by Contributor III
  • 2336 Views
  • 1 replies
  • 4 kudos

Resolved! What's the best architecture for Structured Streaming and why?

I am building an ETL pipeline which reads data from a Kafka topic (data is serialized in Thrift format) and writes it to a Delta table in Databricks. I want to have two layers: Bronze Layer -> which has raw Kafka data; Silver Layer -> which has deserializ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 4 kudos

@John Constantine, "Bronze Layer -> which has raw Kafka data": if you use confluent.io, you can also utilize a direct sink to Data Lake Storage for the bronze layer. "Silver Layer -> which has deserialized data": then use Delta Live Tables to process it to del...

cal
by New Contributor
  • 803 Views
  • 0 replies
  • 0 kudos

G.I.S., Inc. is a distributor and fabricator of thermal and acoustical insulation systems for industrial, commercial, power, process, original equipme...

G.I.S., Inc. is a distributor and fabricator of thermal and acoustical insulation systems for industrial, commercial, power, process, original equipment manufacturers, plumbing and HVAC industries. In today's fast paced market, consumers have a multi...

Anonymous
by Not applicable
  • 2514 Views
  • 1 replies
  • 1 kudos

Resolved! "policy_id" parameter in JOB API

I can't find information about that parameter in https://docs.databricks.com/dev-tools/api/latest/jobs.html. Where is it documented?

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 1 kudos

I believe it is just "policy_id". As an incomplete example the specification via API would be something like: { "cluster_id": "1234-567890-abd35gh", "spark_context_id": 1234567890, "cluster_name": "my_cluster", "spark_version": "9.1.x-scala2....

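Building on the partial spec in the reply, the field sits inside the cluster specification of the job payload. A sketch of assembling such a payload in Python; the version, node type, worker count, and policy ID values are placeholders, so check them against your workspace and the Jobs API reference:

```python
import json

def new_cluster_spec(policy_id: str) -> dict:
    """Minimal new_cluster fragment carrying a cluster policy (illustrative values)."""
    return {
        "new_cluster": {
            "spark_version": "9.1.x-scala2.12",  # placeholder runtime version
            "node_type_id": "i3.xlarge",          # placeholder node type
            "num_workers": 2,
            "policy_id": policy_id,               # the parameter asked about
        }
    }

print(json.dumps(new_cluster_spec("ABC123DEF456"), indent=2))
```

The resulting dict can be embedded in a Jobs API create/update request body; the policy then constrains which cluster settings the job may use.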
Labels