Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

parthsalvi
by Contributor
  • 3338 Views
  • 1 reply
  • 2 kudos

Amazon SES : boto3 credentials not found. DBR 11.2 Shared mode

We're trying to send email via Amazon SES using boto3.client in Python. We've added SES full access to the cluster's IAM role.   We were able to send email in "No isolation shared" mode in DBR 11.2 using ses = boto3.client('ses', region_name='us-****-2'...

Latest Reply
JameDavi_51481
Contributor
  • 2 kudos

This appears to be an intentional design choice to prevent users from using the credentials of the host machine to carry out arbitrary AWS API calls. I really wish there was a workaround or setting to disable this behavior because we put a lot of wor...
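A hedged workaround sketch: if instance-profile credentials are blocked in shared access mode, explicit credentials can be passed to boto3 directly. The secret scope/key names and email addresses below are hypothetical, and the region is a placeholder for the value redacted in the post.

  import boto3

  # Hypothetical secret scope/key names -- store the AWS keys in Databricks
  # secrets instead of relying on the blocked instance-profile credentials.
  access_key = dbutils.secrets.get(scope="aws", key="ses-access-key")
  secret_key = dbutils.secrets.get(scope="aws", key="ses-secret-key")

  ses = boto3.client(
      "ses",
      region_name="us-east-2",  # placeholder; the post redacts it as 'us-****-2'
      aws_access_key_id=access_key,
      aws_secret_access_key=secret_key,
  )
  response = ses.send_email(
      Source="sender@example.com",  # hypothetical addresses
      Destination={"ToAddresses": ["recipient@example.com"]},
      Message={
          "Subject": {"Data": "Test"},
          "Body": {"Text": {"Data": "Hello from SES"}},
      },
  )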

Henrik
by New Contributor III
  • 1723 Views
  • 1 reply
  • 1 kudos

Resolved! Run notebooks on serverless SQL cluster

Is it just me, or am I right that we can't run notebooks on a serverless SQL cluster? It would be a nice feature for SQL-based notebooks.

Latest Reply
Henrik
New Contributor III
  • 1 kudos

I figured it out. I needed to start the cluster first.

Sinthiya
by New Contributor II
  • 2113 Views
  • 1 reply
  • 1 kudos

Multiple streaming sources to the single delta live table

In our case, we have multiple sources writing to the same target table. A target table can be populated from multiple source tables, each contributing a set of fields. How can we add/update columns in a target table from multiple sources? In a Delta Live...

Latest Reply
SaiKiranGajjala
New Contributor II
  • 1 kudos

Following.
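For the title's scenario (several streaming sources feeding one target), Delta Live Tables supports multiple append flows into a single streaming table. A minimal sketch, assuming two hypothetical source tables; note this unions rows, while merging different column sets from different sources would need a join or apply_changes upstream.

  import dlt

  # One target table, declared once.
  dlt.create_streaming_table("target")

  # Each decorated function is an independent flow appending into the target.
  @dlt.append_flow(target="target")
  def from_source_one():
      return spark.readStream.table("source_one")  # hypothetical source

  @dlt.append_flow(target="target")
  def from_source_two():
      return spark.readStream.table("source_two")  # hypothetical source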

Mr_K
by New Contributor
  • 10056 Views
  • 2 replies
  • 2 kudos

AnalysisException: [UC_COMMAND_NOT_SUPPORTED] Spark higher-order functions are not supported in Unity Catalog.;

Hello,
forecast_date = '2017-12-01'
spark.conf.set('spark.sql.shuffle.partitions', 500)
# generate forecast for this data
forecasts = (
  history
    .where(history.date < forecast_date)  # limit training data to prior to our forecast date
    .groupBy...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 2 kudos

@Mr_K applyInPandas is a higher-order function in Python. As of now, we do not support higher-order functions in Unity Catalog. We do support direct calls made to Python UDFs. Here is an example of how to reference UDFs in UC - https://docs.databrick...
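A minimal sketch of the kind of direct UDF call the reply says is supported: a Python UDF registered as a Unity Catalog function via SQL. The catalog and schema names are assumptions.

  # Register a Python UDF as a Unity Catalog function (assumed catalog/schema).
  spark.sql("""
  CREATE OR REPLACE FUNCTION main.default.to_upper(s STRING)
  RETURNS STRING
  LANGUAGE PYTHON
  AS $$
  return s.upper() if s else None
  $$
  """)

  # Call it like any SQL function.
  spark.sql("SELECT main.default.to_upper('hello') AS result").show()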

1 More Replies
schnee1
by New Contributor III
  • 9579 Views
  • 8 replies
  • 0 kudos

Access struct elements inside dataframe?

I have a JSON data set that contains a price in a string like "USD 5.00". I'd like to convert the numeric portion to a Double to use in an MLlib LabeledPoint, and have managed to split the price string into an array of strings. The code below creates a data...
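A minimal PySpark sketch of the split-and-cast step, with a toy one-row dataframe standing in for the poster's JSON data:

  from pyspark.sql import functions as F

  df = spark.createDataFrame([("USD 5.00",)], ["price"])

  # Split "USD 5.00" on the space, take the numeric element, cast to double.
  parsed = df.withColumn(
      "amount",
      F.split(F.col("price"), " ").getItem(1).cast("double"),
  )
  parsed.show()  # amount is now a Double usable as an MLlib label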

7 More Replies
MattM
by New Contributor III
  • 7360 Views
  • 8 replies
  • 2 kudos

Resolved! Access Databricks Delta table using SSRS without copying data to AzureSQL

We have our BI facts and dimensions built as Delta tables in our Databricks environment, and they are used for reporting by connecting Power BI reports via a Databricks connection. We now have a need to use this data for another application utilizing SSRS repor...

7 More Replies
kll
by New Contributor III
  • 17951 Views
  • 3 replies
  • 0 kudos

python multiprocessing and the Databricks Architecture - under the hood.

I am curious what is going on under the hood when using the `multiprocessing` module to parallelize a function call and apply it to a Pandas DataFrame along the row axis. Specifically, how does it work with the Databricks architecture / compute? My cluster ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Keval Shah: When using the multiprocessing module in Python to parallelize a function call and apply it to a Pandas DataFrame along the row axis, the following happens under the hood: the Pool object is created with the specified number of processes...
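A minimal sketch of the pattern being described; the key caveat is that multiprocessing parallelizes only across the driver node's cores, not across Spark executors. The function and data below are hypothetical.

  import multiprocessing as mp
  import pandas as pd

  def score(x):
      # Hypothetical per-row computation.
      return x * 2

  pdf = pd.DataFrame({"x": range(100)})

  # All worker processes run on the driver VM; Spark executors stay idle.
  with mp.Pool(processes=4) as pool:
      pdf["y"] = pool.map(score, pdf["x"].tolist())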

2 More Replies
DineshKumar
by New Contributor III
  • 30433 Views
  • 5 replies
  • 2 kudos

Spark Read CSV doesn't preserve the double quotes while reading!

Hi, I am trying to read a CSV file where one column has double quotes, like below:
James,Butt,"Benton, John B Jr",6649 N Blue Gum St
Josephine,Darakjy,"Chanay, Jeffrey A Esq",4 B Blue Ridge Blvd
Art,Venere,"Chemel, James L Cpa",8 W Cerritos Ave #54...

Latest Reply
LearningAj
New Contributor II
  • 2 kudos

Hi team, I am also facing the same issue, and I have applied all the options mentioned in the posts above. I will just post my dataset here. Attached is my input data with 3 different columns, out of which the comment column contains text values with double quotes...
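For reference, the options usually suggested for this case are quote and escape: Spark's CSV reader defaults escape to backslash, so quotes embedded in quoted fields get mangled unless escape is set to the quote character. A sketch with a hypothetical path:

  df = (
      spark.read
      .option("header", True)
      .option("quote", '"')
      .option("escape", '"')      # treat doubled quotes inside fields correctly
      .csv("/path/to/input.csv")  # hypothetical path
  )
  df.show(truncate=False)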

4 More Replies
Ruby8376
by Valued Contributor
  • 10667 Views
  • 11 replies
  • 5 kudos

Resolved! Streaming data from delta table to eventhub after merging data - getting timeout error!!

Here is my code to write data from a Delta table to Event Hubs (from where the consumer team will consume data):
import org.apache.spark.eventhubs._
import org.apache.spark.sql.streaming.Trigger._
import org.apache.spark.sql.types._
impor...

Data Engineering
Azure event hub
spark
streaming job
Latest Reply
Ruby8376
Valued Contributor
  • 5 kudos

Thank you @-werners- 

10 More Replies
cmditch
by New Contributor II
  • 1563 Views
  • 1 reply
  • 0 kudos

Spark UI in GCP is broken

This seems to only be affecting single-node clusters in GCP, not multi-node clusters. I'm seeing 403 responses for all the CSS/JS assets, among other things. I have not encountered this issue in an Azure workspace I have access to. My cluster is ru...

Latest Reply
cmditch
New Contributor II
  • 0 kudos

This is still a problem and makes single-node clusters very difficult to use at the moment. See the attached photo of what the UI looks like.

Venkat_335
by New Contributor II
  • 1741 Views
  • 1 reply
  • 1 kudos

ISO-8859-1 encoding not giving expected result using PySpark

I used the ISO-8859-1 codepage to read some special characters like "A.P. MØLLER - MÆRSK A/S" using PySpark, but the output is not coming out as expected; I am getting output like "A.P. M?LLER - M?RSK A/S". Can someone help resolve this?

Latest Reply
saipujari_spark
Databricks Employee
  • 1 kudos

@Venkat_335 I am not able to reproduce the issue. Please let me know which DBR you are using. It works fine with DBR 12.2 without specifying ISO-8859-1.
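A sketch of explicitly passing the encoding to the CSV reader, with a hypothetical path; if the output still shows '?', the file may actually be in a different encoding (e.g. UTF-8) than assumed.

  df = (
      spark.read
      .option("header", True)
      .option("encoding", "ISO-8859-1")  # also accepted under the name "charset"
      .csv("/path/to/companies.csv")     # hypothetical path
  )
  df.show(truncate=False)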

Luu
by New Contributor III
  • 6195 Views
  • 5 replies
  • 3 kudos

OPTIMIZE ZORDER does not have an effect

Hi all, recently I am facing strange behaviour after an OPTIMIZE ZORDER command. For a large table (around 400 million rows) I executed the OPTIMIZE command with ZORDER on 3 columns. However, it seems that the command does not have any effect, and the c...

Latest Reply
youssefmrini
Databricks Employee
  • 3 kudos

There are several potential reasons why your OPTIMIZE ZORDER command may not have had any effect on your table: the existing data files may already be optimally sorted based on the Z-order and/or column ordering. If the data is already optimized based o...
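One way to verify whether the command actually rewrote anything is the table history, since OPTIMIZE records its metrics there. A sketch with a hypothetical table and columns:

  # Run the optimize (hypothetical table and columns).
  spark.sql("OPTIMIZE my_schema.big_table ZORDER BY (col_a, col_b, col_c)")

  # DESCRIBE HISTORY shows an OPTIMIZE entry whose operationMetrics reveal
  # how many files were added/removed -- zero rewrites means no effect.
  spark.sql("DESCRIBE HISTORY my_schema.big_table").show(truncate=False)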

4 More Replies
Ank
by New Contributor II
  • 10273 Views
  • 5 replies
  • 6 kudos

Why am I getting NameError name ' ' is not defined in another cell?

I defined a dictionary variable Dict, populated it, and called print(Dict) in the first cell of my notebook. In the next cell, I executed print(Dict) again. However, this time it gave me the error NameError: name 'Dict' is not defined. How can that ...

Latest Reply
erigaud
Honored Contributor
  • 6 kudos

Running pip install restarts the Python interpreter, meaning that any variable defined prior to the pip install is lost, so indeed the solution is to run the pip install first, or better, to add the library you want to install directly to the cluster con...
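A notebook-style sketch of the ordering the reply recommends; the package name is hypothetical.

  # Cell 1 -- installs first; %pip restarts the Python interpreter,
  # wiping any variables defined before it ran.
  %pip install some-package

  # Cell 2 -- only define state after all installs are done.
  Dict = {"a": 1}
  print(Dict)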

4 More Replies
AChang
by New Contributor III
  • 1915 Views
  • 1 reply
  • 0 kudos

Best Cluster Setup for intensive transformation workload

I have a PySpark dataframe: 61k rows, 3 columns, one of which is a string column with a max length of 4k characters. I'm doing about 100 different regexp_replace operations on this dataframe, so it is very resource intensive. I'm trying to write this to a Delta ...

Data Engineering
cluster
ETL
regex
Latest Reply
Leonardo
New Contributor III
  • 0 kudos

It seems that you're trying to apply a lot of transformations, but it's basic stuff, so I'd go to the best-practices documentation and find a way to create a compute-optimized cluster. Ref.: https://docs.databricks.com/en/clusters/cluster-config-best...
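Independent of cluster sizing, the ~100 replacements can be chained into one column expression so Catalyst evaluates them in a single projection rather than 100 separate withColumn steps. A sketch with hypothetical patterns and a toy dataframe:

  from functools import reduce
  from pyspark.sql import functions as F

  # Hypothetical (pattern, replacement) pairs standing in for the ~100 rules.
  rules = [(r"\bfoo\b", "bar"), (r"\s+", " ")]

  def apply_rules(col):
      return reduce(lambda c, pr: F.regexp_replace(c, pr[0], pr[1]), rules, col)

  df = spark.createDataFrame([("foo   bar",)], ["text"])
  df = df.withColumn("text_clean", apply_rules(F.col("text")))
  df.show(truncate=False)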

AryaMa
by New Contributor III
  • 32519 Views
  • 13 replies
  • 8 kudos

Resolved! reading data from url using spark

Reading data from a URL using Spark (Community Edition); got a path-related error, any suggestions please?
url = "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv"
from pyspark import SparkFiles
spark.sparkContext.addFil...

Latest Reply
padang
New Contributor II
  • 8 kudos

Sorry, bringing this back up...
from pyspark import SparkFiles
url = "http://raw.githubusercontent.com/ltregan/ds-data/main/authors.csv"
spark.sparkContext.addFile(url)
df = spark.read.csv("file://"+SparkFiles.get("authors.csv"), header=True, inferSc...
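If the SparkFiles path error persists, one hedged alternative is to fetch the file on the driver with pandas and then convert, which sidesteps executors being unable to see the driver-local copy:

  import pandas as pd

  url = "http://raw.githubusercontent.com/ltregan/ds-data/main/authors.csv"

  # pandas downloads over HTTP on the driver; createDataFrame distributes it.
  pdf = pd.read_csv(url)
  df = spark.createDataFrame(pdf)
  df.show()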

12 More Replies
