Data Engineering

Forum Posts

Sorted by:

by amelia1 • New Contributor II

a month ago

381 Views
1 replies
0 kudos

pyspark read data using jdbc url returns column names only

Hello,I have a remote azure sql warehouse serverless instance that I can access using databricks-sql-connector. I can read/write/update tables no problem.But, I'm also trying to read/write/update tables using local pyspark + jdbc drivers. But when I ...

Data Engineering

381 Views
1 replies
0 kudos

a month ago

View Replies

Latest Reply

anardinelli
New Contributor III

a month ago

0 kudos

Hi @amelia1 how are you? What you got was indeed the top 5 rows (see that it was the Row class). What does it show when you run display(df)? I'm thinking it might be something related to your schema, since you did not defined that, it can read the da...

0 kudos

a month ago

by TWib • New Contributor III

05-24-2024 3:40:00 AM

1903 Views
7 replies
3 kudos

DatabricksSession broken for 15.1

This code fails with exception:[NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.File <command-4420517954891674>, line 7 4 spark = DatabricksSession.builder.getOrCreate() 6 df = spark.read.table("samples.nyctaxi.trips") ---->...

Data Engineering

1903 Views
7 replies
3 kudos

05-24-2024 3:40:00 AM

View Replies

Latest Reply

jcap
New Contributor II

a month ago

3 kudos

We are also seeing this error in 14.3 LTS from a simple example:from pyspark.sql.functions import coldf = spark.table('things')things = df.select(col('thing_id')).collect()[NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.

3 kudos

a month ago

6 More Replies

by gianni77 • New Contributor

07-29-2015 4:19:38 AM

41859 Views
13 replies
4 kudos

How can I export a result of a SQL query from a databricks notebook?

The "Download CSV" button in the notebook seems to work only for results <=1000 entries. How can I export larger result-sets as CSV?

Data Engineering

41859 Views
13 replies
4 kudos

07-29-2015 4:19:38 AM

View Replies

Latest Reply

igorstar
New Contributor III

a month ago

4 kudos

If you have a large dataset, you might want to export it to a bucket in parquet format from your notebook:%python df = spark.sql("select * from your_table_name") df.write.parquet(your_s3_path)

4 kudos

a month ago

12 More Replies

by Mits • New Contributor II

05-31-2023 12:31:15 PM

1557 Views
4 replies
3 kudos

Sending email alerts to non-databricks user

I am trying to send email alerts to a non databricks user. I am using Alerts feature available in SQL. Can someone help me with the steps.Do I first need to first add Notification Destination through Admin settings and then use this newly added desti...

Data Engineering

1557 Views
4 replies
3 kudos

05-31-2023 12:31:15 PM

View Replies

Latest Reply

Anonymous
Not applicable

06-14-2023 12:12:36 AM

3 kudos

Hi @Mitali Lad Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

3 kudos

06-14-2023 12:12:36 AM

3 More Replies

by Phani1 • Valued Contributor

a month ago

259 Views
1 replies
0 kudos

integrating Azure Databricks with AAD

Hi Team, Could you please provide the details/process for integrating Azure Databricks - Unity Catalog and AAD? Regards,Phani

Data Engineering

delta

259 Views
1 replies
0 kudos

a month ago

View Replies

Latest Reply

raphaelblg
Contributor III

a month ago

0 kudos

Hello @Phani1 ,These doc pages might be useful for you: Set up and manage Unity CatalogSync users and groups from Microsoft Entra ID

0 kudos

a month ago

by ismaelhenzel • New Contributor III

a month ago

266 Views
1 replies
0 kudos

Upsert into a Delta Lake table with merge when using row masking function

I'm using databricks rls functions on my tables, and i need to make some merges into, but tables with rls functions does not support merge operations (https://docs.databricks.com/en/data-governance/unity-catalog/row-and-column-filters.html#limitation...

Data Engineering

266 Views
1 replies
0 kudos

a month ago

View Replies

Latest Reply

raphaelblg
Contributor III

a month ago

0 kudos

Hi @ismaelhenzel, if you want to use the "MERGE INTO" sql command, you must turn-off rls. This is by design.

0 kudos

a month ago

by SamarthJain • New Contributor II

11-24-2021 1:05:05 AM

3751 Views
5 replies
2 kudos

Hi All,I'm facing an issue with my Spark Streaming Job. It gets stuck in the "Stream Initializing" phase for more than 3 hours.Need your...

Hi All,I'm facing an issue with my Spark Streaming Job. It gets stuck in the "Stream Initializing" phase for more than 3 hours.Need your help here to understand what happens internally at the "Stream Initializing" phase of the Spark Streaming job tha...

Data Engineering

3751 Views
5 replies
2 kudos

11-24-2021 1:05:05 AM

View Replies

Latest Reply

olivier_soucy
New Contributor II

a month ago

2 kudos

I also had the same issue, but it seems only happening on DBR >= 15.0. Any idea why?

2 kudos

a month ago

4 More Replies

by Mathias • New Contributor II

a month ago

193 Views
1 replies
0 kudos

Delay rows coming into DLT pipeline

Backgroundand requirements: We are reading data from our factory and storing it in a DLT table called telemetry with columns sensorid, timestamp and value. We need to get rows where sensorid is “qrreader-x” and join with some other data from that sam...

Data Engineering

193 Views
1 replies
0 kudos

a month ago

View Replies

Latest Reply

raphaelblg
Contributor III

a month ago

0 kudos

Hi @Mathias, I'd say that watermarking might be a good solution for your use case. Please check Control late data threshold with multiple watermark policy in Structured Streaming. If you want to dig-in further there's also: Spark Structured Streami...

0 kudos

a month ago

by EcuaCrisCar • New Contributor III

07-03-2023 10:37:10 AM

607 Views
1 replies
0 kudos

Sending a personalized message to email.

Greetings community, I am new to using databricks and for some time I have tried some scripts in notebook. I would like your help on a task: Carry out a personalized mailing where, First, a query of the number of records in the test table is performe...

Data Engineering

SENDEMAIL SQL

607 Views
1 replies
0 kudos

07-03-2023 10:37:10 AM

View Replies

Latest Reply

Kaniz_Fatma
Community Manager

a month ago

0 kudos

Hi @EcuaCrisCar, To query the number of records in your test table, you can use SQL or DataFrame APIs in Databricks.Next, you’ll need to check if the record count falls within the specified range (80,000 to 90,000). If it does, proceed with the note...

0 kudos

a month ago

by filipjankovic • New Contributor

07-10-2023 2:06:14 AM

2653 Views
1 replies
0 kudos

JSON string object with nested Array and Struct column to dataframe in pyspark

I am trying to convert JSON string stored in variable into spark dataframe without specifying schema, because I have a big number of different tables, so it has to be dynamically. I managed to do it with sc.parallelize, but since we are moving to Uni...

Data Engineering

2653 Views
1 replies
0 kudos

07-10-2023 2:06:14 AM

View Replies

Latest Reply

Kaniz_Fatma
Community Manager

a month ago

0 kudos

Hi @filipjankovic, Since you have multiple tables and need dynamic schema inference, I recommend using the following approach: Schema Inference from JSON String: You can infer the schema from the JSON string and then create a DataFrame. Schema I...

0 kudos

a month ago

by NikhilK1998 • New Contributor II

07-05-2023 3:19:20 PM

1009 Views
1 replies
1 kudos

DataBricks Certification Exam Got Suspended. Require support for the same.

Hi,I applied for Databricks Certified: Data Engineer Professional certification on 5th July 2023. The test was going fine for me but suddenly there was an alert from the system (I think I was in proper angle in front of camera and was genuinely givin...

Data Engineering

1009 Views
1 replies
1 kudos

07-05-2023 3:19:20 PM

View Replies

Latest Reply

Kaniz_Fatma
Community Manager

a month ago

1 kudos

Hi @NikhilK1998, I'm sorry to hear your exam was suspended. Thank you for filing a ticket with our support team. Please allow the support team 24-48 hours to resolve. In the meantime, you can review the following documentation: Room requirements Beh...

1 kudos

a month ago

by Avinash_Narala • New Contributor III

05-23-2024 2:29:38 AM

399 Views
1 replies
0 kudos

Instance profile failure while installing Databricks Overwatch

Despite following the steps mentioned in the provided link to create an instance profile, we encountered a problem in step 6 where we couldn't successfully add the instance profile to Databricks(Step 6: Add the instance profile to Databricks).https:/...

Data Engineering

overwatch

399 Views
1 replies
0 kudos

05-23-2024 2:29:38 AM

View Replies

Latest Reply

Kaniz_Fatma
Community Manager

a month ago

0 kudos

Hi @Avinash_Narala, The error message you provided indicates that the verification of the instance profile failed due to an AWS authorization issue. Specifically, the user associated with the assumed role arn:aws:sts::755231362028:assumed-role/databr...

0 kudos

a month ago

by MiBjorn • New Contributor II

a month ago

384 Views
2 replies
1 kudos

Optimizing Data Insertion Speed for JSON Files in DLT Pipeline

Background:I'm working on a data pipeline to insert JSON files as quickly as possible. Here are the details of my setup: File Size: 1.5 - 2 kB eachFile Volume: Approximately 30,000 files per hourPipeline: Using Databricks Delta Live Tables (DLT) in c...

Data Engineering

384 Views
2 replies
1 kudos

a month ago

View Replies

Latest Reply

Kaniz_Fatma
Community Manager

a month ago

1 kudos

Hi @MiBjorn, Confirm that you're using the appropriate DLT product edition (Core, Pro, or Advanced) based on your workload requirements1.You'll receive an error message if your pipeline includes features that are not supported by the selected edit...

1 kudos

a month ago

1 More Replies

by sukanya09 • New Contributor

a month ago

310 Views
1 replies
0 kudos

Photon is not supported for a query

(1) LocalTableScan Output [11]: [path#23524, partitionValues#23525, size#23526L, modificationTime#23527L, dataChange#23528, stats#23529, tags#23530, deletionVector#23531, baseRowId#23532L, defaultRowCommitVersion#23533L, clusteringProvider#23534] Arg...

Data Engineering

Databricks

MERGE

Photon

310 Views
1 replies
0 kudos

a month ago

View Replies

Latest Reply

Kaniz_Fatma
Community Manager

a month ago

0 kudos

Hi @sukanya09, The query you provided includes a LocalTableScan node, which Photon does not fully support.The specific node you mentioned has several attributes, such as path, partitionValues, size, modificationTime, and more.Unfortunately, Photon e...

0 kudos

a month ago

by prith • New Contributor III

a month ago

858 Views
6 replies
1 kudos

Resolved! Datbricks JDK 17 upgrade error

We tried upgrading to JDK 17Using Spark version 3.0.5 and runtime 14.3 LTSGetting this exception using parallelstream()With Java 17 I am not able to parallel process different partitions at the same time. This means when there is more than 1 partiti...

Data Engineering

858 Views
6 replies
1 kudos

a month ago

View Replies

Latest Reply

prith
New Contributor III

a month ago

1 kudos

Anyways - thanks for your response - We found a workaround for this error and JDK 17 is actually working - it appears faster than JDK 8

1 kudos

a month ago

5 More Replies

User

Count

1602

738

348

285

247

Databricks Community

Forum Posts

pyspark read data using jdbc url returns column names only

DatabricksSession broken for 15.1

How can I export a result of a SQL query from a databricks notebook?

Sending email alerts to non-databricks user

integrating Azure Databricks with AAD

Upsert into a Delta Lake table with merge when using row masking function

Hi All,I'm facing an issue with my Spark Streaming Job. It gets stuck in the "Stream Initializing" phase for more than 3 hours.Need your...

Delay rows coming into DLT pipeline

Sending a personalized message to email.

JSON string object with nested Array and Struct column to dataframe in pyspark

DataBricks Certification Exam Got Suspended. Require support for the same.

Instance profile failure while installing Databricks Overwatch

Optimizing Data Insertion Speed for JSON Files in DLT Pipeline

Photon is not supported for a query

Resolved! Datbricks JDK 17 upgrade error

Databricks with Private cloud

Pyspark serialization

Getting com.databricks.client.jdbc.Driver is not f...

Unit Testing DLT Pipelines

Retrieve job-level parameters in spark_python_task...