Data Engineering

Forum Posts

by User16826992945 • Databricks Employee

06-23-2021 7:11:30 AM

8163 Views
2 replies
0 kudos

Why is the current limit for rows returned in Databricks SQL 64K?

Latest Reply

prasadvaze
Valued Contributor II

05-11-2022 1:42:13 PM

0 kudos

@Amine El Helou, when will this limit be lifted? My users won't switch from SSMS to Databricks SQL unless they can see the entire result set in the UI (which is sometimes more than 10 million rows).

1 More Replies

by dataslicer • Contributor

04-14-2022 3:29:53 PM

8594 Views
6 replies
2 kudos

Resolved! Exploring additional cost saving options for structured streaming 24x7x365 uptime workloads

I currently have multiple jobs (each running its own job cluster) for my Spark Structured Streaming pipelines, which are long-running 24x7x365 on DBR 9.x/10.x LTS. My SLAs are 24x7x365 with 1-minute latency. I have already accomplished the following co...

Latest Reply

Anonymous
Not applicable

04-14-2022 4:11:05 PM

2 kudos

Autoscaling doesn't work with structured streaming, so that's not really an option. Autoscaling is based on jobs sitting in the jobs queue for a long time, but that's not the case with streaming. Streaming is more like many frequent small jobs. Spot in...

5 More Replies

by Ben_Spark • New Contributor III

04-14-2022 3:11:54 AM

9586 Views
4 replies
2 kudos

Resolved! Databricks Spark XML parser : support for namespace declared at the ancestor level.

I'm trying to use the Spark-XML API and I'm facing an issue with the XSD validation option. When I parse an XML file using the "rowValidationXSDPath" option, the parser can't recognize the prefixes/namespaces declared at the root level. For this to...

Latest Reply

Ben_Spark
New Contributor III

05-11-2022 6:34:54 AM

2 kudos

Hi, sorry for the late response; I got busy looking for a permanent solution to this problem. In the end, we are giving up on the XSDpath parser. This option does not work when prefixes/namespaces are declared at the ancestor level. Thank you anyway for ...

3 More Replies

by CrisBerg_65149 • New Contributor III

04-14-2022 2:34:45 AM

6492 Views
4 replies
6 kudos

Resolved! SELECT * FROM delta doesn't work on Spark 3.2

I'm using DBR 10 or later and getting an error when running the following query: SELECT * FROM delta.`s3://some_path`. The error is org.apache.spark.SparkException: Unable to fetch tables of db delta. For 3.2.0+ they recommend reading like this: CREATE TEMPORAR...

Latest Reply

CrisBerg_65149
New Contributor III

05-11-2022 5:46:31 AM

6 kudos

Got support from Databricks. Unfortunately, someone had created a DB called delta, so the query was run against that DB instead. The issue is solved.
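For anyone hitting the same error, here is a minimal sketch of how one might check for that kind of name collision and read the table by path through the DataFrame reader instead of SQL. The helper names `has_conflicting_database` and `read_delta_by_path` are hypothetical (they are not from the thread), and `spark` is assumed to be an active SparkSession on a cluster with Delta Lake available.

```python
from typing import Iterable


def has_conflicting_database(database_names: Iterable[str], fmt: str = "delta") -> bool:
    """Return True if any database is named like the format prefix (hypothetical helper)."""
    return any(name.lower() == fmt.lower() for name in database_names)


def read_delta_by_path(spark, path: str):
    """Read a Delta table by path via the DataFrame reader rather than a SQL identifier.

    Assumption: `spark` is an active SparkSession with Delta Lake available.
    """
    return spark.read.format("delta").load(path)


# Usage on a cluster (not executed here):
#   names = [db.name for db in spark.catalog.listDatabases()]
#   if has_conflicting_database(names):
#       df = read_delta_by_path(spark, "s3://some_path")
```

Reading by path this way should not depend on how the `delta.` prefix resolves in SQL, but renaming or dropping the conflicting database is the cleaner long-term fix.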

3 More Replies

by Gerhard • New Contributor III

01-26-2022 12:46:50 AM

9718 Views
9 replies
5 kudos

Overall security/access rights concept needed (combine Table Access Control and Credential Passthrough), how to allow users the benefits of both worlds

What we have: Databricks Workspace Premium on Azure; ADLS Gen2 storage for raw data, processed data (tables) and files like CSV, models, etc. What we want to do: We have users that want to work on Databricks to create and work with Python algorithms. We d...

Latest Reply

Gerhard
New Contributor III

05-10-2022 11:21:34 PM

5 kudos

Hey @Vartika Nain, we are still in the same situation as described above. The Hive Metastore is a weak point. I would love to have the functionality that a mount can be dedicated to a given cluster. Regards, Gerhard

8 More Replies

by Reza • New Contributor III

04-09-2022 1:41:25 PM

8638 Views
4 replies
4 kudos

Datepicker widget

There are textbox and dropdown list widgets in Databricks. Is there any datepicker widget? If not, is there any plan to add it?

Latest Reply

Hubert-Dudek
Databricks MVP

04-11-2022 1:21:47 PM

4 kudos

@Reza Rajabi, no, there is not. I think someone discussed it at some meeting. We can ask about it during the next office hours https://databricks.com/p/webinar/databricks-office-hours?utm_source=databricks&utm_medium=email&utm_campaign=7013f0000...
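Until a date picker exists, one possible workaround is to take the date through a text widget and validate it in code. The sketch below is only an illustration: the helper `parse_date_widget`, the widget name `run_date`, and the `YYYY-MM-DD` format are assumptions, not something suggested in the thread.

```python
from datetime import date, datetime


def parse_date_widget(raw_value: str, fmt: str = "%Y-%m-%d") -> date:
    """Parse the string returned by a text widget into a date.

    Raises ValueError when the text does not match `fmt` (hypothetical helper).
    """
    return datetime.strptime(raw_value.strip(), fmt).date()


# In a Databricks notebook (dbutils only exists there, so this is not executed here):
#   dbutils.widgets.text("run_date", "2022-04-09", "Run date (YYYY-MM-DD)")
#   run_date = parse_date_widget(dbutils.widgets.get("run_date"))
```

Failing fast with a ValueError keeps a mistyped date from silently flowing into later queries.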

3 More Replies

by laus • New Contributor III

03-31-2022 9:35:11 AM

26143 Views
3 replies
2 kudos

Resolved! get a "Py4JJavaError: An error occurred while calling o5082.csv." when trying to save to csv file.

Hi, I'm trying to save a dataframe to CSV with the code below: output.coalesce(1).write.mode('overwrite').option('header', 'true').csv(tmp_file_path) But I get a "Py4JJavaError: An error occurred while calling o5082.csv." error. Any idea how to solve...

by Rahul_Samant • Contributor

03-14-2022 3:55:28 AM

16172 Views
4 replies
4 kudos

Resolved! Bucketing on Delta Tables

I'm getting the error below while creating buckets on a Delta table. Error in SQL statement: AnalysisException: Delta bucketed tables are not supported. I have had to fall back to a Parquet table because of this for some use cases. Is there any alternative for this? I have...

Latest Reply

Anonymous
Not applicable

05-10-2022 5:57:58 AM

4 kudos

Hi @Rahul Samant, we checked internally on this. Due to certain limitations, bucketing is not supported on Delta tables. The only alternative to bucketing is to leverage Z-ordering; below is the link for reference: https://docs.databricks.com/de...
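As a rough illustration of the Z-ordering suggestion, the sketch below builds a Databricks `OPTIMIZE ... ZORDER BY` statement from a table name and a list of columns. The helper `build_zorder_statement` and the table and column names in the usage comment are hypothetical, and the helper does no identifier quoting or validation.

```python
from typing import Sequence


def build_zorder_statement(table_name: str, columns: Sequence[str]) -> str:
    """Build an OPTIMIZE ... ZORDER BY statement for a Delta table (hypothetical helper)."""
    if not columns:
        raise ValueError("ZORDER BY needs at least one column")
    column_list = ", ".join(columns)
    return f"OPTIMIZE {table_name} ZORDER BY ({column_list})"


# Usage on Databricks (assumes `spark` is an active SparkSession; names are made up):
#   spark.sql(build_zorder_statement("sales", ["customer_id"]))
```

Z-ordering co-locates related values in the same files, so it can help filters on those columns, but it is not a drop-in replacement for bucketed joins.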

3 More Replies

by Michael_Galli • Databricks Partner

05-06-2022 4:19:28 AM

6766 Views
3 replies
2 kudos

Resolved! Spark Streaming - only process new files in streaming path?

In our streaming jobs, we currently run streaming (cloudFiles format) on a directory with sales transactions coming in every 5 minutes. In this directory, the transactions are ordered in the following format: <streaming-checkpoint-root>/<transaction_date>...

Latest Reply

Michael_Galli
Databricks Partner

05-09-2022 11:00:26 PM

2 kudos

Update: It seems that maxFileAge was not a good idea. The following, with the option "includeExistingFiles" = False, solved my problem: streaming_df = ( spark.readStream.format("cloudFiles") .option("cloudFiles.format", extension) .option("...
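Since the snippet above is cut off, here is a separate minimal sketch of the same idea: an Auto Loader stream that skips files already present in the directory. It is not the poster's full code. The helpers `autoloader_options` and `read_new_files` are hypothetical, `spark` is assumed to be an active SparkSession on Databricks, and schema settings are left out for brevity.

```python
def autoloader_options(file_format: str, include_existing_files: bool = False) -> dict:
    """Build Auto Loader reader options (hypothetical helper).

    `cloudFiles.includeExistingFiles` set to "false" asks Auto Loader to pick up
    only files that arrive after the stream is first started.
    """
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.includeExistingFiles": str(include_existing_files).lower(),
    }


def read_new_files(spark, source_path: str, file_format: str):
    """Start an Auto Loader stream over `source_path` that ignores pre-existing files.

    Assumption: runs on Databricks, where the "cloudFiles" source is available.
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(file_format))
        .load(source_path)
    )
```

As far as I know, this option is only evaluated when the stream starts for the first time, so changing it on an existing checkpoint would not have any effect.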

2 More Replies

by AvijitDey • New Contributor III

05-02-2022 11:35:42 AM

6786 Views
3 replies
4 kudos

Resolved! Azure Databrick SQL bulk insert to AZ SQL

Env: Azure Databricks. Version: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12). Worker type: 56 GB memory, 2-8 nodes (Standard D13_V2). Number of rows: 2470350, with 115 columns. Size: 2.2 GB. Time taken: approx. 9 min. Python code. What will be the best approach for...
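One knob that is often worth trying for this kind of load is the batch size and write parallelism of Spark's built-in JDBC writer. The sketch below is only an illustration of those options: the helper names and the default values are assumptions, not tuned recommendations, and nothing here comes from the replies in the thread.

```python
def jdbc_write_options(url: str, table: str, user: str, password: str,
                       batch_size: int = 10000, num_partitions: int = 8) -> dict:
    """Build options for Spark's JDBC writer (hypothetical helper, illustrative defaults).

    `batchsize` controls how many rows go into each JDBC batch insert;
    `numPartitions` caps the number of parallel connections used for the write.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "batchsize": str(batch_size),
        "numPartitions": str(num_partitions),
    }


def write_to_sql(df, options: dict) -> None:
    """Append a DataFrame to the target table through the JDBC data source."""
    df.write.format("jdbc").options(**options).mode("append").save()
```

Larger batches and more partitions can shorten the load, but they also put more pressure on the target database, so any values should be checked against its limits.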

Latest Reply

AvijitDey
New Contributor III

05-09-2022 6:59:22 AM

4 kudos

Any further suggestions?

2 More Replies

by reedzhang • New Contributor III

03-23-2022 9:15:20 AM

6046 Views
4 replies
3 kudos

Resolved! uninstalled libraries continue to get installed on cluster startup

We have been trying to update some library versions by uninstalling the old versions and installing new ones. However, the old libraries continue to get installed on cluster startup despite not showing up in the "libraries" tab of the cluster page. W...

Latest Reply

reedzhang
New Contributor III

05-08-2022 3:20:20 PM

3 kudos

The issue seemed to go away on its own. At some point the libraries page started showing what was getting installed to the cluster, and removing libraries from the page caused them to stop getting installed on cluster startup. I'm guessing there was ...

3 More Replies

by tomnguyen_195 • New Contributor III

05-06-2022 2:15:01 AM

4949 Views
4 replies
7 kudos

Resolved! Increase input rate in Delta Live Tables

Hi, I need to ingest 60 million JSON files from S3 and have created a Delta Live Tables pipeline to ingest this data into a Delta table with Auto Loader. However, the input rate in my DLT is always around 8 records/second no matter how many workers I add to the DLT...

Latest Reply

Hubert-Dudek
Databricks MVP

05-06-2022 9:01:51 AM

7 kudos

Please consider the following: consider having a driver 2 times bigger than the workers; check whether S3 is in the same region and is communicating via the private gateway (local IPs); enable S3 transfer acceleration; in ingestion, please use Auto Loader as described here...

3 More Replies

by Bill • New Contributor III

05-02-2022 6:37:20 AM

3578 Views
5 replies
2 kudos

Resolved! How to access tables created in 2017

In 2017, while working on my master's degree, I created some tables that I would like to access again. Back then I could just write SQL and find them, but today that doesn't work. I suspect it has something to do with Delta Lake. What do I have to do to...

Latest Reply

Bill
New Contributor III

05-07-2022 7:01:35 AM

2 kudos

That did it. Thanks

4 More Replies

by Hubert-Dudek • Databricks MVP

05-07-2022 3:34:11 AM

1399 Views
0 replies
20 kudos

From databricks runtime 10.5 ARRAY_SIZE function was added.
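For readers who have not used it yet, here is a small sketch of what the function does: a SQL query string that calls `array_size`, plus a plain-Python mirror of the documented behavior (element count, with NULL in giving NULL out). The constant `ARRAY_SIZE_QUERY` and the Python function are illustrative only; the real function runs in SQL on the cluster.

```python
from typing import Optional, Sequence

# SQL form, to be run on DBR 10.5+ (assumes `spark` is an active SparkSession):
#   spark.sql(ARRAY_SIZE_QUERY).show()
ARRAY_SIZE_QUERY = "SELECT array_size(array(1, 2, 3)) AS n"


def array_size(values: Optional[Sequence]) -> Optional[int]:
    """Plain-Python mirror of the SQL semantics: element count, or None for a NULL array."""
    return None if values is None else len(values)
```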

by Anonymous • Not applicable

05-06-2022 2:53:17 PM

2020 Views
1 replies
1 kudos

Resolved! Unable to start cluster on E2 Workspace

Hello Community, I'm trying to create and start my first cluster on my E2 Databricks Workspace on AWS. The cluster is created, but after STARTING, the cluster status immediately goes to TERMINATING. Logs provided by Databricks show ...

Latest Reply

Anonymous
Not applicable

05-06-2022 4:06:56 PM

1 kudos

Update: It was an error on my side with the KMS key.
