# Shared imports and storage locations for a Delta Live Tables (DLT) pipeline.
from pyspark.sql import functions as F
from pyspark.sql import types as T
from pyspark.sql import DataFrame, Column
from pyspark.sql.types import Row
import dlt

# Source data and schema locations (bucket paths redacted in the original post).
S3_PATH = 's3://datalake-lab/XXXXX/'
S3_SCHEMA = 's3://datalake-lab/XXXXX/schemas/'
...
This documentation https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.SparkSession.sql.html#pyspark.sql.SparkSession.sql claims that spark.sql() should be able to take kwargs, such that the following should work: display...
Ok, it looks like Databricks might have broken this functionality shortly after it came out: https://community.databricks.com/t5/data-engineering/parameterized-spark-sql-not-working/m-p/57969/highlight/true#M30972
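For reference, the documented usage looks like this (a minimal sketch, assuming PySpark 3.4+, where parameterized spark.sql() was introduced; the names src and threshold are illustrative):

df = spark.range(10)
# kwargs are substituted into the {placeholders} in the query string;
# DataFrames, Columns, and literals are all accepted.
result = spark.sql("SELECT * FROM {src} WHERE id > {threshold}", src=df, threshold=5)
display(result)

Whether this works on your workspace may depend on the regression discussed in the thread linked above.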
Hi Team, please provide guidance on enabling parallel execution of SQL cells in a notebook that contains multiple SQL cells. Currently, when we execute the notebook, all the SQL cells run sequentially. I would appreciate assistance on how to execute th...
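SQL cells in a notebook always run one after another; a common workaround (an assumption on my part, not an official feature) is to drive the statements from Python threads so Spark can schedule the jobs concurrently:

from concurrent.futures import ThreadPoolExecutor

# Hypothetical queries standing in for the notebook's SQL cells.
queries = [
    "SELECT COUNT(*) FROM table_a",
    "SELECT COUNT(*) FROM table_b",
]

def run(q):
    # Each thread submits its query through the shared SparkSession;
    # Spark runs the resulting jobs concurrently.
    return spark.sql(q).collect()

with ThreadPoolExecutor(max_workers=len(queries)) as pool:
    results = list(pool.map(run, queries))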
There are multiple tables in the config/metadata table. These tables need to be validated for DQ rules:
1. Natural Key / Business Key / Primary Key cannot be null or blank.
2. Natural Key / Primary Key cannot be duplicated.
3. Join columns missing values.
4. Busine...
Hi @subha2, To dynamically validate the data quality (DQ) rules for tables configured in a metadata-driven system using PySpark, you can follow these steps:
Define Metadata for Tables:
First, create a metadata configuration that describes the rules ...
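A minimal sketch of that pattern, assuming a metadata structure that lists each table with its key columns (all names here are hypothetical):

from pyspark.sql import functions as F

# Hypothetical metadata: which key columns must be non-null/non-blank and unique.
rules = [
    {"table_name": "silver.orders", "key_columns": ["order_id"]},
    {"table_name": "silver.customers", "key_columns": ["customer_id"]},
]

for rule in rules:
    df = spark.table(rule["table_name"])
    keys = rule["key_columns"]

    # Rule 1: keys cannot be null or blank.
    pred = None
    for k in keys:
        p = F.col(k).isNull() | (F.trim(F.col(k).cast("string")) == "")
        pred = p if pred is None else (pred | p)
    null_or_blank = df.filter(pred).count()

    # Rule 2: keys cannot be duplicated.
    duplicates = df.groupBy(*keys).count().filter(F.col("count") > 1).count()

    print(rule["table_name"], "null/blank:", null_or_blank, "duplicates:", duplicates)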
I need Databricks for a university research project. Is there any possibility of EDU discounts on DBUs? So far I have been unable to reach Databricks sales. Can you connect me with someone from Databricks?
Hi community, I am using a PySpark UDF. The function is being imported from a repo (in the Repos section) and registered as a UDF in the notebook. I am getting a PythonException error when the transformation is run. This is coming from the databric...
I was getting a similar error (full traceback below), and determined that it's related to this issue. Setting the env variables DATABRICKS_HOST and DATABRICKS_TOKEN as suggested in that Github issue resolved the problem for me (albeit it's not a grea...
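For reference, the workaround looks roughly like this (the workspace URL and secret scope are placeholders; set the variables before the UDF module is imported and registered):

import os

# Hypothetical values; adjust for your environment.
os.environ["DATABRICKS_HOST"] = "https://<workspace-url>"
os.environ["DATABRICKS_TOKEN"] = dbutils.secrets.get(scope="my-scope", key="my-token")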
I have a notebook in Azure Databricks that does some transformations on a bronze tier table and inserts the transformed data into a silver tier table. This notebook is used to do an initial load of the data from our existing system into our new datal...
Please review your Spark UI from the old job execution versus the new job execution. You might need to check whether the data volume has increased; that could be the reason for the OOM.
Hi, for some reason Azure Databricks doesn't show History if the data is saved with SparkR (2 in the figure below) or sparklyr (3), but it does show it with Data Ingestion (0) or with PySpark (1). Is this a known bug, or am I doing something wrong? Is ...
Hi @Sagas, Let’s address your questions regarding Azure Databricks, SparkR, and Sparklyr.
History in Azure Databricks:
Each operation that modifies a Delta Lake table creates a new table version. You can use history information to audit operation...
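Regardless of which API wrote the data, the Delta history can be queried directly; a minimal sketch (the table name is a placeholder):

# DESCRIBE HISTORY works on any Delta table, whether it was written
# from PySpark, SparkR, or sparklyr.
history = spark.sql("DESCRIBE HISTORY my_catalog.my_schema.my_table")
history.select("version", "timestamp", "operation").show(truncate=False)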
Hi Team, we intend to keep the job cluster active around the clock. We are considering the following parameters regarding cost:
- Data volumes
- Client SLA for job completion
- Starting with a small cluster configuration
Please advise on any other options we s...
Hi @Phani1, When configuring a job cluster for 24/7 operation, it’s essential to consider cost, performance, and scalability.
Here are some recommendations based on your specified parameters:
Data Volumes:
Analyze your data volumes carefully. If...
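As a starting point, an autoscaling job-cluster spec along those lines might look like this (a sketch; the runtime version, node type, and worker counts are assumptions to tune against your data volumes and SLA):

new_cluster = {
    "spark_version": "14.3.x-scala2.12",   # assumed LTS runtime
    "node_type_id": "Standard_DS3_v2",     # assumed Azure node type; start small
    "autoscale": {"min_workers": 2, "max_workers": 8},
}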
Dear all, I have a workflow with 2 tasks: one that does OPTIMIZE, followed by one that does VACUUM. I used a cluster with an F32s driver and 8 F64s workers (auto-scaling enabled). All 8 workers are launched by Databricks as soon as OPTIMIZE starts. As ...
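For context, the two maintenance tasks boil down to the following commands (the table name is a placeholder; 168 hours is Delta's default 7-day retention):

# Task 1: compact small files.
spark.sql("OPTIMIZE my_catalog.my_schema.my_table")
# Task 2: remove files no longer referenced by the table.
spark.sql("VACUUM my_catalog.my_schema.my_table RETAIN 168 HOURS")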
Hi, I am having an issue loading source data into a Delta table / Unity Catalog. The error we are receiving is the following: grpc_message:"[DELTA_EXCEED_CHAR_VARCHAR_LIMIT] Exceeds char/varchar type length limitation. Failed check: (isnull(\'metric_...
Hey @Paul92S, looking at the error message it seems the column "metric_name" is the culprit here.
Understanding the Error:
Character Limit Violation: The error indicates that values in the metric_name column are exceeding the maximum length allowed fo...
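One hedged way around the limit is to truncate (or re-type) the offending column before the write; the 255-character limit below is an assumption, so match it to your table's DDL:

from pyspark.sql import functions as F

# Truncate metric_name to the column's declared VARCHAR length
# (alternatively, alter the column type to STRING).
df = df.withColumn("metric_name", F.substring("metric_name", 1, 255))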
Hello Databricks Community, I am currently working in a Databricks environment and trying to set up custom logging using Log4j in a Python notebook. However, I've run into a problem due to the use of Spark Connect, which does not support the _jvm attr...
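Since Spark Connect removes JVM access from the client, one common fallback (an assumption, not the only option) is plain Python logging instead of Log4j:

import logging

# Hypothetical logger name; Python logging needs no _jvm access.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("my_notebook")
logger.info("custom logging without _jvm")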
Hi Team, is there any impact when integrating Databricks with Boomi as opposed to Azure Event Hub? Could you offer some insights on the integration of Boomi with Databricks? https://boomi.com/blog/introducing-boomi-event-streams/ Regards, Janga
Hi @Phani1, Let’s explore the integration of Databricks with Boomi and compare it to Azure Event Hub.
Databricks Integration with Boomi:
Databricks is a powerful data analytics platform that allows you to process large-scale data and build machin...
Hi Community, I was trying to load an ML model from an Azure Storage account (abfss://...) with: model = PipelineModel.load(path). I set the Spark config:
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provi...