I have data like below, and when reading it as CSV I don't want a comma inside quotes to be treated as a separator, even when the quotes are not immediately next to the separator (like record #2). Records 1 and 3 parse fine with the separator, but it is failing on the 2nd record...
Hi, I think you can use this option for the CSV reader: spark.read.options(header = True, sep = ",", unescapedQuoteHandling = "BACK_TO_DELIMITER").csv("your_file.csv"), especially the unescapedQuoteHandling option. You can search for the other options at this l...
I am trying to read data from an Azure SQL database from Databricks. The Azure SQL database is created with a private link endpoint. Using a DBR 10.4 LTS cluster, the expectation is that the connector is pre-installed as per the documentation. Using the below code to fetch...
It seems that .option("databaseName", "test") is redundant here, as you need to include the database name in the URL. Please verify that you use a connector compatible with your cluster's Spark version: Apache Spark connector: SQL Server & Azure SQL
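For illustration, a minimal sketch of such a read, assuming the com.microsoft.sqlserver.jdbc.spark connector is installed on the cluster; the server, credentials, and table name below are placeholders:

# Database name carried in the JDBC URL itself, not as a separate option
jdbc_url = "jdbc:sqlserver://<your-server>.database.windows.net:1433;databaseName=test"

df = (spark.read
      .format("com.microsoft.sqlserver.jdbc.spark")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.my_table")   # hypothetical table
      .option("user", "<username>")
      .option("password", "<password>")
      .load())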
March Madness + Data. Here at Databricks we like to use (you guessed it) data in our daily lives. Today kicks off a series called Databrags. Databrags are glimpses into how Bricksters and community folks like you use data to solve everyday problems, e...
Folks, when I want to push data to Snowflake I need to use a stage for files before copying the data over. However, when I utilise the net.snowflake.spark.snowflake.Utils library and do a spark.write, as in...
spark.read.format("csv")
    .option("header", ...
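For reference, a hedged sketch of writing directly with the spark-snowflake connector, which (as far as I understand) moves the data through a temporary internal stage on its own; all connection values and the target table name below are placeholders:

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

(df.write
   .format("snowflake")             # shorthand for net.snowflake.spark.snowflake
   .options(**sf_options)
   .option("dbtable", "MY_TABLE")   # hypothetical target table
   .mode("append")
   .save())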
Hi Team, I am trying to run a streaming job in Databricks, using the Autoloader approach to read files in parquet format from Azure Data Lake Gen2. I have created a new checkpoint, so the first offset is getting created, but it is throwing an erro...
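For comparison, a minimal Auto Loader sketch with parquet input; all paths below are placeholders:

# Read parquet files incrementally with Auto Loader
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "parquet")
      .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")   # hypothetical
      .load("abfss://<container>@<account>.dfs.core.windows.net/<path>"))

# Write to a Delta target with a fresh checkpoint location
(df.writeStream
   .format("delta")
   .option("checkpointLocation", "/mnt/checkpoints/autoloader_demo")    # hypothetical
   .start("/mnt/delta/target"))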
When attempting to edit the schedule cron expression on one of our jobs, we receive the following error message: Cluster validation error: Validation failed for spark_conf, spark.databricks.acl.dfAclsEnabled must be false (is "true"). The spark.databric...
Hi there, I am trying to build a delta live tables pipeline that ingests gzip compressed archives as they're uploaded to S3. The archives contain 2 files in a proprietary format, and one is needed to determine how to parse the other. Once the file co...
So Databricks gives us a great toolkit in the form of OPTIMIZE and VACUUM. But in terms of operationalizing them, I am really confused about the best practice. Should we enable "optimized writes" by setting the following at a workspace level? spark.conf.set...
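For illustration, a hedged sketch of the two levels at which optimized writes can be enabled; note the spark.conf.set calls apply to the current cluster/session rather than the whole workspace, and the table name is a placeholder:

# Session/cluster level
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# Per table, via table properties (table name hypothetical)
spark.sql("""
  ALTER TABLE my_db.my_table SET TBLPROPERTIES (
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact'   = 'true'
  )
""")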
@AKSHAY PALLERLA Just checking in to see if you got a solution to the issue you shared above. Let us know! Thanks to @Werner Stinckens for jumping in, as always!
Hi Team, we have a scenario where we have to connect to Databricks SQL instance 1 from another Databricks instance 2 using a notebook or Azure Data Factory. Can you please help?
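One possible approach (a sketch, not a definitive recipe) is to query the SQL warehouse of instance 1 from a notebook in instance 2 using the databricks-sql-connector package; the hostname, HTTP path, token, and table name below are placeholders:

from databricks import sql   # pip install databricks-sql-connector

with sql.connect(server_hostname="<instance1-host>.azuredatabricks.net",
                 http_path="/sql/1.0/warehouses/<warehouse-id>",
                 access_token="<personal-access-token>") as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM my_db.my_table LIMIT 10")   # hypothetical table
        rows = cursor.fetchall()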
Looking for best practices/examples on how to pull data (epics, features, PBIs) from Azure Boards into Databricks for analysis. Any ideas/help appreciated!
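One possible starting point (a sketch, assuming a personal access token and the Azure DevOps REST API are acceptable); the organization, project, token, and selected fields are placeholders:

import requests

org, project, pat = "<org>", "<project>", "<personal-access-token>"
wiql = {"query": ("SELECT [System.Id] FROM WorkItems "
                  "WHERE [System.WorkItemType] IN ('Epic', 'Feature', 'Product Backlog Item')")}

# Run a WIQL query to get matching work item ids
ids = [str(w["id"]) for w in requests.post(
    f"https://dev.azure.com/{org}/{project}/_apis/wit/wiql?api-version=7.0",
    json=wiql, auth=("", pat)).json()["workItems"]]

# Fetch the work item details (batch endpoint is limited to 200 ids per call)
items = requests.get(
    f"https://dev.azure.com/{org}/_apis/wit/workitems?ids={','.join(ids[:200])}&api-version=7.0",
    auth=("", pat)).json()["value"]

rows = [(i["id"],
         i["fields"].get("System.WorkItemType"),
         i["fields"].get("System.Title"),
         i["fields"].get("System.State")) for i in items]
df = spark.createDataFrame(rows, ["id", "type", "title", "state"])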
I have a DataFrame that I created from a couple of datasets and multiple operations. The DataFrame has multiple columns, one of which is an array of strings. But when I take the DataFrame and try to filter based upon the size of this array co...
Strange, it works fine here. What version of Databricks are you on? What you could do to identify the issue is output the query plan (.explain). Also, creating a new df for each transformation could help; that way you can check step by step where...
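For illustration, a small sketch of filtering on the size of an array column and printing the plan; df and the column name "tags" are placeholders:

from pyspark.sql import functions as F

filtered = df.filter(F.size(F.col("tags")) > 2)   # keep rows whose array has more than 2 elements
filtered.explain(True)                            # print the extended query plan to trace the issue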
We are building a delta live pipeline where we ingest csv files in AWS S3 using cloudFiles. And it is necessary to access the file modification timestamp of the file. As documented here, we tried selecting `_metadata` column in a task in delta live p...
Update: We were able to test the `_metadata` column feature in DLT "preview" mode (which uses DBR 11.0). Databricks doesn't recommend the "preview" channel for production workloads, but nevertheless, we're glad to be using this feature in DLT.
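For reference, a minimal sketch of selecting the file modification timestamp from the _metadata column in a cloudFiles read; the file format and paths below are placeholders, and inside a DLT table function you would simply return this DataFrame:

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", "/tmp/schema")   # hypothetical
      .load("s3://<bucket>/<prefix>/")
      .select("*", "_metadata.file_path", "_metadata.file_modification_time"))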
Hello, currently we have a process that builds the bronze and silver zones with Delta tables, and when it reaches gold we must create specific zones for each client because the schema changes. For this we create separate databases and tables, but when ...
Hi @alexander grajales vanegas, are you creating all the databases and tables in the gold zone manually? If so, please check out DLT https://docs.databricks.com/data-engineering/delta-live-tables/index.html; it will take care of your complete pipeline by ...
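For illustration, a minimal DLT sketch that generates one gold table per client instead of creating databases and tables by hand; the client list, table names, and column names are hypothetical:

import dlt
from pyspark.sql import functions as F

clients = ["client_a", "client_b"]   # hypothetical client list

def make_gold_table(client):
    # Define a separate gold table for each client
    @dlt.table(name=f"gold_{client}")
    def gold():
        return dlt.read("silver_table").filter(F.col("client_id") == client)

for c in clients:
    make_gold_table(c)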
We have a Denodo big data platform hosted on Databricks. Recently we have been facing an exception with the message '[Simba][SparkJDBCDriver](500550)', which interrupts the Databricks connection after a certain time interval, usuall...
Hi All, we are also experiencing the same behavior: [Simba][SimbaSparkJDBCDriver] (500550) The next rowset buffer is already marked as consumed. The fetch thread might have terminated unexpectedly. Foreground thread ID: xxxx. Background thread ID: yyyy...
Hi Team, I am trying to get the latest files from an ADLS mount point directory. I am not sure how to extract the latest files and their last modified date using PySpark from an ADLS Gen2 storage account. Please let me know asap. Thanks!
I am looking forward to your re...
Hi @pankaj92, I wrote Python code to pick the latest file from the mnt location:

import os

path = "/dbfs/mnt/xxxx"
filelist = []
for file_item in os.listdir(path):
    filelist.append(file_item)
file = len(filelist)
print(filelist[file - 1])

Thanks
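A hedged variation on the same idea that picks the most recently modified file rather than the last entry returned by os.listdir (whose order is not guaranteed); the mount path is a placeholder:

import os

path = "/dbfs/mnt/xxxx"
files = [os.path.join(path, f) for f in os.listdir(path)]
latest = max(files, key=os.path.getmtime)   # newest file by modification time
print(latest)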